What Is Claimed Is: 



1. An apparatus that facilitates self-correcting memory in a shared-memory
system, comprising:
a main memory;

a memory controller coupled to the main memory;

a processor cache;

a communication channel coupled to the processor cache and to the
memory controller;

an error detection and correction mechanism within the memory
controller; and

a reading mechanism within the memory controller that is configured to
read a data from the processor cache when a currently valid copy of the data is
checked out to the processor cache;

wherein the error detection and correction mechanism corrects errors in
the data and stores a corrected copy of the data in the main memory.

2. The apparatus of claim 1, wherein the error detection and
correction mechanism performs single bit error correction/double bit error
detection.
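
Illustrative note (not part of the claims): single bit error correction/double bit error detection can be sketched with an extended Hamming code. The sketch below assumes a 4-bit data word and an 8-bit codeword purely for brevity; the claims do not specify a word size or a particular code, and the names `encode` and `decode` are hypothetical.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Bit 0 is an overall parity bit; bits 1..7 form a Hamming(7,4) code
    with parity bits at positions 1, 2, and 4 (assumed layout).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) & 1  # even parity over the whole codeword
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(8)]
    # The syndrome is the XOR of the positions (1..7) of all set bits.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped, at index `syndrome`
        # (index 0 means the overall parity bit itself).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, status
```

In this sketch a single flipped bit is repaired, while any two flipped bits are reported as uncorrectable rather than silently miscorrected.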

3. The apparatus of claim 1, wherein the error detection and
correction mechanism performs double bit error correction.

4. The apparatus of claim 1, further comprising:
an input/output cache; and

wherein the reading mechanism is further configured to read the data from
the input/output cache when the currently valid copy of the data is checked out to 
the input/output cache; 

wherein the error detection and correction mechanism corrects errors in 
the data and stores the corrected copy of the data in the main memory. 

5. The apparatus of claim 4, further comprising: 
a second processor cache; 

wherein the reading mechanism is further configured to read the data from 
the second processor cache when the currently valid copy of the data is checked 
out to the second processor cache; and 

wherein the error detection and correction mechanism corrects errors in 
the data and stores the corrected copy of the data in the main memory. 

6. The apparatus of claim 5, further comprising a marking mechanism 
within the memory controller that is configured to mark a location in the main 
memory to indicate that the data from the location is checked-out to a cache, 
wherein the cache is one of, the processor cache, the input/output cache, and the 
second processor cache. 

7. The apparatus of claim 6, further comprising a scrubbing 
mechanism within the memory controller that is configured to access each 
location within the main memory periodically to allow the error detection and 
correction mechanism to detect and correct errors. 

8. The apparatus of claim 7, further comprising:
a detecting mechanism coupled to the scrubbing mechanism that is
configured to detect the location in the main memory when the location is marked
that the data from the location is checked-out to the cache; and

the reading mechanism that is further configured to request a read from the
communication channel if the location is so marked.

9. The apparatus of claim 8, wherein the communication channel is a
coherent network.

10. A multiprocessor shared-memory computing system that facilitates
self-correcting memory, comprising:
a main memory;

a memory controller coupled to the main memory;

a processor cache;

a central processing unit coupled to the processor cache;

a communication channel coupled to the processor cache and to the
memory controller;

an error detection and correction mechanism within the memory
controller; and

a reading mechanism within the memory controller that is configured to
read a data from the processor cache when a currently valid copy of the data is
checked out to the processor cache;

wherein the error detection and correction mechanism corrects errors in
the data and stores a corrected copy of the data in the main memory.

11. The multiprocessor shared-memory computing system of claim 10,
wherein the error detection and correction mechanism performs single bit error
correction/double bit error detection.

12. The multiprocessor shared-memory computing system of claim 10,
wherein the error detection and correction mechanism performs double bit error
correction.

13. The multiprocessor shared-memory computing system of claim 10,
further comprising:
an input/output cache;

an input/output device coupled to the input/output cache; and

the reading mechanism further configured to read the data from the
input/output cache when the currently valid copy of the data is checked out to the
input/output cache;

wherein the error detection and correction mechanism corrects errors in
the data and stores the corrected copy of the data in the main memory.

14. The multiprocessor shared-memory computing system of claim 13,
further comprising:
a second processor cache;

a second central processing unit coupled to the second processor cache; and

the reading mechanism that is further configured to read the data from the
second processor cache when the currently valid copy of the data is checked out to
the second processor cache;

wherein the error detection and correction mechanism corrects errors in
the data and stores the corrected copy of the data in the main memory.

15. The multiprocessor shared-memory computing system of claim 14,
further comprising a marking mechanism within the memory controller that is
configured to mark a location in the main memory to indicate that the data from
the location is checked-out to a cache, wherein the cache is one of, the processor
cache, the input/output cache, and the second processor cache.

16. The multiprocessor shared-memory computing system of claim 15,
further comprising a scrubbing mechanism within the memory controller that is
configured to access each location within the main memory periodically to allow
the error detection and correction mechanism to detect and correct errors.

17. The multiprocessor shared-memory computing system of claim 16,
further comprising:
a detecting mechanism coupled to the scrubbing mechanism that is
configured to detect the location in the main memory when the location is marked
that the data from the location is checked-out to the cache; and

the reading mechanism that is further configured to request a read from the
communication channel if the location is so marked.

18. The multiprocessor shared-memory computing system of claim 17,
wherein the communication channel is a coherent network.

19. A method for facilitating self-correcting memory in a shared
memory system, comprising:
marking as invalid a memory location within a plurality of memory
locations when a data from the memory location is checked out to a cache;

scrubbing the plurality of memory locations for errors; and

upon detecting the memory location marked as invalid:
    reading the data from the cache associated with the memory
    location,
    correcting an error in the data, and
    writing the data to the memory location.

20. The method of claim 19, wherein scrubbing the plurality of
memory locations for errors includes:
accessing the memory location;

locating a valid copy of the data associated with the memory location;

reading the valid copy of the data;

correcting an error in the data;

writing the data to the memory location; and

repeating the steps of accessing, locating, reading, correcting, and writing
for each memory location within the plurality of memory locations.
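
Illustrative note (not part of the claims): the accessing, locating, reading, correcting, and writing steps recited above could be sketched as the loop below. Everything here is an assumption made for illustration: the class and method names (`MemoryController`, `check_out`, `scrub`) are hypothetical, caches are modeled as plain dictionaries rather than devices on a communication channel, and a triple-copy majority vote stands in for the error detection and correction mechanism, which the claims do not tie to any particular code. The sketch also leaves the checked-out mark in place after the write, a choice the claims do not address.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def ecc_encode(value: int) -> Tuple[int, int, int]:
    """Stand-in ECC: store three copies of the word."""
    return (value, value, value)


def ecc_correct(stored: Tuple[int, int, int]) -> int:
    """Bitwise majority vote; repairs errors confined to one copy."""
    a, b, c = stored
    return (a & b) | (a & c) | (b & c)


@dataclass
class Location:
    stored: Tuple[int, int, int]
    owner: Optional[str] = None  # cache holding the valid copy, if checked out


@dataclass
class MemoryController:
    memory: Dict[int, Location] = field(default_factory=dict)
    caches: Dict[str, Dict[int, int]] = field(default_factory=dict)

    def write(self, addr: int, value: int) -> None:
        self.memory[addr] = Location(ecc_encode(value))

    def check_out(self, addr: int, cache_name: str) -> None:
        """Hand the data to a cache and mark the memory location."""
        loc = self.memory[addr]
        self.caches.setdefault(cache_name, {})[addr] = ecc_correct(loc.stored)
        loc.owner = cache_name

    def scrub(self) -> None:
        """Visit every location and rewrite it from its valid copy."""
        for addr, loc in self.memory.items():
            if loc.owner is not None:
                # Marked location: the valid copy lives in the owning cache.
                value = self.caches[loc.owner][addr]
            else:
                # Unmarked location: correct the copy held in memory.
                value = ecc_correct(loc.stored)
            loc.stored = ecc_encode(value)  # write the corrected copy back
```

In this model, a marked location is refreshed from the cache's copy, so an error that accumulated in the stale memory copy is overwritten instead of being left to grow into an uncorrectable one.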